The plot here shows wine reviews from the dataset available from https://www.kaggle.com/zynicide/wine-reviews/ and compares them to the price of the wine.
wine <- read.csv("~/R/DevelopingDataProducts/wine-reviews/winemag-data-130k-v2.csv")
August 7, 2018
The dataset is filtered to keep only wines that have a listed price and whose price is not a high outlier, meaning it falls below the third quartile plus 1.5 times the interquartile range (Q3 + 1.5*IQR). From these, 500 wines are randomly sampled so that the graph is not too busy.
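library(dplyr)  # provides filter()
# Drop wines with no listed price
wine <- wine[!is.na(wine$price), ]
# Keep wines priced below the upper fence: Q3 + 1.5 * IQR
upper_fence <- quantile(wine$price, 0.75) + 1.5 * IQR(wine$price)
wine <- filter(wine, price < upper_fence)
# Fix the seed so the same 500 wines are sampled on every run
set.seed(16)
wine <- wine[sample(nrow(wine), 500), ]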
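To make the outlier rule concrete, here is a minimal sketch on a small made-up vector of prices. The values in `prices` are hypothetical and are not taken from the wine dataset; the sketch uses only base R and assumes the default quantile method.

```r
# Hypothetical toy prices, not from the wine dataset
prices <- c(10, 12, 15, 18, 20, 25, 30, 400)

# Upper fence: third quartile plus 1.5 times the interquartile range
q3 <- quantile(prices, 0.75, names = FALSE)
upper_fence <- q3 + 1.5 * IQR(prices)

# Keep only prices strictly below the fence
kept <- prices[prices < upper_fence]

# Setting the same seed before sampling gives the same selection
set.seed(16)
first_draw <- sample(length(kept), 3)
set.seed(16)
second_draw <- sample(length(kept), 3)

print(upper_fence)
print(kept)
print(identical(first_draw, second_draw))
```

Here the single extreme price of 400 lies above the fence and is dropped, while the remaining seven prices are kept. Resetting the seed before each call to `sample()` is what makes the draw repeatable.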
The plot is then built with ggplot and made interactive with ggplotly (part of the plotly package).
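library(ggplot2)
library(plotly)  # provides ggplotly()
# Scatter plot of points against price with a smoothed trend line
p <- ggplot(data = wine, aes(x = price, y = points)) +
  geom_point() +
  geom_smooth() +
  labs(x = "Price (USD)", y = "Points",
       title = "Wine Quality vs. Price")
# Convert the static ggplot into an interactive plotly widget
gg <- ggplotly(p)
gg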